AD-A197  534 

REPORT DOCUMENTATION PAGE


1. REPORT NUMBER: TR40
2. GOVT ACCESSION NO.:


3. RECIPIENT'S CATALOG NUMBER:
4. TITLE (and Subtitle): Multi-Order Calibration
5. TYPE OF REPORT & PERIOD COVERED: Technical Report - Interim
6. PERFORMING ORG. REPORT NUMBER:
7. AUTHOR(s): Eugenio Sanchez and Bruce R. Kowalski
8. CONTRACT OR GRANT NUMBER(s): N00014-75-C-0536
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Laboratory for Chemometrics, Department of Chemistry BG-10, University of Washington
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS: NR 051-565
11. CONTROLLING OFFICE NAME AND ADDRESS: Materials Sciences Division, Office of Naval Research
12. REPORT DATE: July 1, 1988
13. NUMBER OF PAGES: 16
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
15. SECURITY CLASS. (of this report):
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:
16. DISTRIBUTION STATEMENT (of this Report): This document has been approved for public release and sale; its distribution is unlimited.


17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report):
18. SUPPLEMENTARY NOTES: Submitted and accepted for publication in IUPAC special report on chemometrics.
19. KEY WORDS: Bilinear, Calibration, Dyadic, Multivariate, Multidimensional Arrays, Multiorder, Multilinear, PLS, PCR, GRAM, Rank Annihilation, Tensor, PARAFAC




UNCLASSIFIED 

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)




Instruments that generate two-dimensional arrays of data are now
commonplace in the analytical laboratory. Time decay and emission-excitation
fluorescence, chromatography-spectroscopy combinations, MS-MS and 2D-NMR
are a few of the many so-called "hyphenated methods" that generate such data.
These instruments have become very important for the analyst mainly because
of their higher selectivity and resolution of signals, allowing for analysis of
mixtures. The main similarity between all these instruments is that each sample
analyzed produces a two-dimensional array of data (second order tensor). The
amount of information produced by such an instrument is overwhelming; for
quantitative analysis usually only a small portion of the data is actually used
and the rest discarded.

The situation is even worse if several samples have to be compared,
because the accumulated data would be a three-dimensional array (third
order tensor). It is obvious that with the standard statistical tools
(e.g., univariate linear regression) the chemist is seriously under-prepared
to analyze these data. Even multivariate statistical techniques are hard
pressed to analyze higher order data, and in the best case, they cannot
fully extract all the information available.

This paper will summarize a multi-order, tensorial approach to
calibration that takes advantage of all the information from instruments
that produce data arrays of any order, for prediction of unknown properties
such as analyte concentrations. The types of instruments are classified
here according to the kind of array of data produced per sample: some
instruments generate a single number (signal) per analysis or sample.
Others generate two or more signals (first order instruments), i.e.,
a vector of signals. Still other instruments can generate a matrix of
data per sample, or a 3-dimensional array (second and third-order
instruments). This distinction will be called the "order of the instrument",
in analogy with the order of a tensor.

The "order of the instrument" has special importance beyond the simple
fact of the form of the data. Some higher order instruments permit analyses
that are not possible with lower order instruments: the multi-order advantage.
The simplest example is provided by the difference between a single sensor
instrument (zero order) and an array of sensors (first order): first order
data permit quantitation of multicomponent mixtures and detection of outlier
unknowns, neither of which is possible with zero order data. LC/DA-UV data
will be used to illustrate how some second order instruments permit
multicomponent analysis with a single calibration sample.
MULTI-ORDER CALIBRATION


Eugenio  Sanchez  and  Bruce  R.  Kowalski 
Laboratory  for  Chemometrics,  Department  of  Chemistry  BG-10 
University  of  Washington 
Seattle,  Washington  98195,  U.S.A. 


Instruments  that  generate  two-dimensional  arrays  of  data  are  now  commonplace  in  the 
analytical  laboratory.  Time  decay  and  emission-excitation  fluorescence,  chromatography- 
spectroscopy  combinations,  MS-MS  and  2D-NMR,  are  a  few  of  the  many  so-called 
“hyphenated  methods”  that  generate  such  data.1  These  instruments  have  become  very 
important  for  the  analyst  mainly  because  of  their  higher  selectivity  and  resolution  of 
signals,  allowing  for  analysis  of  mixtures.  The  main  similarity  between  all  these 
instruments  is  that  each  sample  analyzed  produces  a  two-dimensional  array  of  data  (second 
order  tensor).  The  amount  of  information  produced  by  a  second  order  instrument  is 
overwhelming;  for  quantitative  analysis  usually  only  a  small  portion  of  the  data  is  actually 
used  and  the  rest  discarded. 

The  situation  is  even  worse  if  several  samples  have  to  be  compared,  because  the 
accumulated  data  can  be  represented  by  a  three-dimensional  array  (third  order  tensor).  It  is 
obvious  that  with  the  standard  statistical  tools  (e.g.,  univariate  linear  regression)  the 
chemist  is  seriously  under-prepared  to  analyze  these  data.  Even  multivariate  statistical 
techniques  are  hard  pressed  to  analyze  higher  order  data,  and  even  in  the  best  case,  they 
cannot  fully  extract  all  the  information  available. 

This paper summarizes a multi-order, tensorial approach to calibration that takes
full advantage of all the information from instruments that produce data arrays of any order,
for prediction of unknown properties such as analyte concentrations. The types of
instruments are classified here according to the kind of array of data produced per sample:
some instruments generate a single number (signal) per analysis or sample. Others
generate two or more signals (first order instruments), i.e., a vector of signals. Still
other instruments generate a matrix of data per sample, or a 3-dimensional array (second
and third-order instruments). This distinction will be called the order of the instrument, in
analogy with the order of a tensor.

The order of the instrument has special importance beyond the simple fact of the form
of the data. Some higher order instruments permit analyses that are not possible with
lower order instruments. The simplest example is provided by the difference between a
single sensor instrument (zero order) and an array of sensors (first order): first order data
permit quantitation of multicomponent mixtures and detection of outlier unknowns, neither
of which is possible with zero order data.


Zero  Order  Instruments:  Univariate  Calibration. 

Zero order instruments are by far the most common kind of instrument in analytical
chemistry; simple sensors and detectors are included in this category. Fig. 1a shows a
typical linear calibration experiment. Several samples of known concentration are used to
build a model, and then the model is used to predict the concentration, c, of an unknown
from its response, r. Fig. 1b illustrates the same model applied to a sample that contains an
interferent. Clearly, not only is it impossible to estimate the actual concentration of the
analyte, it is also impossible to detect the interferent. The importance of this simple
fact cannot be over-emphasized; much of an analytical chemist's time is spent ensuring that
the sample is pure, without interferences, or finding a measurement technique that
responds only to the analyte of interest (fully selective).
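The point of Fig. 1 can be reproduced with a few made-up numbers. The following minimal Python/NumPy sketch assumes a hypothetical blank response of 0.1, a sensitivity of 2.0 and an interferent that adds 0.8 to the response; a straight line is fitted to the standards and inverted to predict c from r. The interferent biases the prediction, and the single measured number carries no sign that anything is wrong.

```python
import numpy as np

def fit_line(conc, resp):
    """Least-squares calibration line resp = b0 + b1 * conc."""
    b1, b0 = np.polyfit(conc, resp, 1)   # polyfit returns the highest power first
    return b0, b1

def predict(r, b0, b1):
    """Invert the calibration line to estimate c from a single response r."""
    return (r - b0) / b1

# Hypothetical standards: blank response 0.1, sensitivity 2.0 per concentration unit.
conc_std = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
resp_std = 0.1 + 2.0 * conc_std
b0, b1 = fit_line(conc_std, resp_std)

r_pure = 0.1 + 2.0 * 1.5        # unknown with c = 1.5 and no interferent
r_mixed = r_pure + 0.8          # same unknown plus an interferent contributing 0.8

c_pure = predict(r_pure, b0, b1)    # 1.5
c_mixed = predict(r_mixed, b0, b1)  # 1.9: biased, and the single number gives no warning
```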


First  Order  Instruments:  Multivariate  Calibration. 

A multichannel spectrometer or an array of sensors constitutes a first order instrument.
The numerical values of the responses of a sensor array with p sensors can be arranged as
the components, $r_i$, of a vector $\mathbf{r}$. Therefore, this sensor array response “spectrum” can
be considered a vector in a multidimensional vector space, where the base vectors of the
space are the unit responses of each sensor, and the components are the actual
numbers that the sensor array has provided.

The problem of linear multivariate calibration consists of finding a linear combination
of the instrument responses that is optimal for prediction of the analyte concentration in the
sample. To estimate this optimal weighting of the responses, the p responses, $r_i$, from a
set of samples of known concentration are recorded and used for the prediction of analyte
concentrations, $\hat{c}_u$, in future, unknown samples:

$$\hat{c}_u = \sum_{i=1}^{p} r_i\,x_i \tag{1}$$

where the weights $x_i$ are the components of a vector $\mathbf{x}$.

It can be shown that the optimal set of weights for prediction is a vector $\mathbf{x}$ that must
be perpendicular to the spectra of all other analytes and interferences present in the
unknown sample. In other words, it is the contravariant component of the analyte
spectrum,2 or net analyte signal vector.3 A general solution to the problem of finding $\mathbf{x}$,
given a calibration set of response vector-concentration pairs $\{\mathbf{r}_i, c_i\}$ ordered into a
matrix $\mathbf{R}$ (one response vector per row) and a vector $\mathbf{c}$, is


$$\hat{\mathbf{x}} = \hat{\mathbf{R}}^{+}\,\mathbf{c} \tag{2}$$


where $\hat{\mathbf{R}}^{+}$ is an estimated pseudoinverse of $\mathbf{R}$. Many different methods have been
developed to estimate the optimal pseudoinverse for prediction; among them, principal
components regression4 (PCR) and partial least squares5 (PLS) calibration have been
extensively studied in the recent chemometrics literature.
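As an illustration of equations 1 and 2, here is a minimal sketch of PCR in which the pseudoinverse is built from the first k singular vectors of R (a truncated SVD). The spectra, concentrations and the choice k = 2 are hypothetical, and no mean-centering or intercept is used because the synthetic data follow a zero-intercept linear model. With these noise-free data the estimated weight vector is orthogonal to the spectrum of component B, whose concentrations are never given to the regression, as the net analyte signal argument above predicts.

```python
import numpy as np

def pcr_weights(R, c, k):
    """Estimate x_hat = R^+ c (equation 2), with R^+ built from the first k
    principal components of R via a truncated SVD (no mean-centering)."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    R_pinv = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T
    return R_pinv @ c

# Hypothetical pure spectra of components A and B over p = 5 channels.
s_A = np.array([0.1, 0.6, 1.0, 0.5, 0.1])
s_B = np.array([0.8, 0.7, 0.3, 0.1, 0.0])

# Calibration set: c_A is known; B is present but its concentrations are never used.
c_A = np.array([0.5, 1.0, 1.5, 2.0, 0.8, 1.2])
c_B = np.array([1.0, 0.2, 0.7, 1.5, 0.3, 1.1])
R = np.outer(c_A, s_A) + np.outer(c_B, s_B)   # one response vector per row

x_hat = pcr_weights(R, c_A, k=2)

# Equation 1: the prediction is the weighted sum of the unknown's responses.
r_u = 0.7 * s_A + 1.3 * s_B
c_u_hat = r_u @ x_hat        # 0.7, despite the overlapping spectrum of B
```

Truncating at k = 2 discards the numerically zero singular values; with real, noisy data the number of components would have to be chosen, for example by cross-validation.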

Fig.  2  illustrates  an  example  of  multivariate  calibration.  Two  kinds  of  interferences  are 
possible  with  a  first  order  instrument: 

(1)  Fig.  2a.  Interferences  present  both  in  the  calibration  set  and  in  the  unknown  sample: 
component  B  was  present  in  the  calibration  samples,  even  though  its  concentration  was 
unknown.  Multivariate  calibration  can  still  accurately  predict  the  concentration  of 
component  A  in  the  presence  of  B. 
(2) Fig. 2b. Interferences or background constituents present only in the unknown
sample: component C was not present in the calibration samples; therefore, no good
estimate of the concentration of component A is possible. It is possible, however, to detect this
sample as an outlier to the model, because the spectrum of C does not lie in the space spanned by the spectra of A and B.
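Case (2) can be sketched by measuring how much of an unknown response vector lies outside the space spanned by the calibration spectra. The residual norm below is only one simple choice (in practice a statistical threshold would be needed to decide what counts as "large"), and all spectra, including that of the hypothetical component C, are invented for illustration.

```python
import numpy as np

def spectral_residual(R, r, k):
    """Norm of the part of r outside the space spanned by the first k right singular
    vectors of the calibration matrix R (rows = calibration spectra)."""
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    V_k = Vt[:k].T
    return np.linalg.norm(r - V_k @ (V_k.T @ r))

# Hypothetical pure spectra over 5 channels; C never appears in the calibration set.
s_A = np.array([0.1, 0.6, 1.0, 0.5, 0.1])
s_B = np.array([0.8, 0.7, 0.3, 0.1, 0.0])
s_C = np.array([0.0, 0.1, 0.2, 0.9, 1.0])

c_A = np.array([0.5, 1.0, 1.5, 2.0, 0.8, 1.2])
c_B = np.array([1.0, 0.2, 0.7, 1.5, 0.3, 1.1])
R = np.outer(c_A, s_A) + np.outer(c_B, s_B)

r_ok = 0.7 * s_A + 1.3 * s_B      # case (1): only calibrated constituents
r_out = r_ok + 0.5 * s_C          # case (2): contains the unmodelled component C

res_ok = spectral_residual(R, r_ok, k=2)    # essentially zero
res_out = spectral_residual(R, r_out, k=2)  # clearly non-zero: flags the sample as an outlier
```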

Second  Order  Instruments:  Second  Order  Calibration. 

Instruments that generate two-dimensional arrays of data (second order instruments) are
now commonplace in the analytical laboratory. In chemistry, the usual way to handle
this kind of data has been to choose from the array a single element that is unique to the
analyte of interest, discarding, or perhaps not even collecting, the rest of the data. For example,
in MS-MS it is often possible to find daughter ions that are unique to one
analyte of a mixture. For an emission-excitation matrix (EEM), it is sometimes possible to
find a combination of excitation and emission wavelengths at which only the analyte of
interest has a significant signal. This is a valid approach when the analyst knows that the
signal being used is unique, just as in univariate calibration; however, it does not take
advantage of all the information available.

It is also possible to unfold the data into a vector, as suggested by Wold and
coworkers,6 and then use multivariate techniques such as PLS for calibration and data
analysis. Unfortunately, when a second order array of data is broken into a first order
array, e.g., by stacking the columns of the matrix into one long column vector, the
relationship among the rows is lost in the process. For some kinds of data this may not cause
problems, because that relationship may be unimportant, but for bilinear data arrays, such as
LC/UV or EEM, unfolding produces a drastic loss of information.
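The loss can be illustrated with a small, hypothetical experiment. If the elements of the unfolded vectors are scrambled by any fixed permutation, a pseudoinverse-based first order calibration gives the same prediction (up to rounding), so it cannot be using the row/column structure. The scrambled matrix, on the other hand, is no longer bilinear (rank 1). The matrices, concentrations, permutation seed and the `rcond` cutoff are assumptions made for the sketch.

```python
import numpy as np

# Hypothetical bilinear (rank-1) responses on a 6 x 4 grid (elution times x wavelengths).
N1 = np.outer([0.1, 0.5, 1.0, 0.6, 0.2, 0.05], [0.3, 1.0, 0.7, 0.2])   # analyte
N2 = np.outer([0.0, 0.2, 0.4, 1.0, 0.7, 0.3], [0.9, 0.5, 0.2, 0.6])    # interferent

def unfold(mat):
    """Stack the columns of a matrix into one long vector (column-major order)."""
    return mat.reshape(-1, order="F")

# Calibration set: analyte concentration known, interferent varies; one unfolded sample per row.
c1 = np.array([0.5, 1.0, 1.5, 2.0])
c2 = np.array([1.0, 0.3, 0.8, 0.4])
R = np.array([unfold(a * N1 + b * N2) for a, b in zip(c1, c2)])
r_u = unfold(0.9 * N1 + 0.6 * N2)

# Scramble the variables with a fixed random permutation.
perm = np.random.default_rng(0).permutation(R.shape[1])

# First order prediction (pseudoinverse regression) with original and scrambled variables.
pred = r_u @ (np.linalg.pinv(R, rcond=1e-10) @ c1)
pred_scrambled = r_u[perm] @ (np.linalg.pinv(R[:, perm], rcond=1e-10) @ c1)

# The scrambled pure-analyte matrix is no longer bilinear.
N1_scrambled = unfold(N1)[perm].reshape(N1.shape, order="F")
rank_before = np.linalg.matrix_rank(N1)            # 1
rank_after = np.linalg.matrix_rank(N1_scrambled)   # greater than 1
```

Because permuting the variables permutes the pseudoinverse in the same way, the two predictions agree; any information carried by which elements share a row or a column is therefore invisible to the unfolded model.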

Assuming a linear model, the response of a second order instrument to a
multicomponent sample, $\mathbf{M}$, should be approximately equal to a linear combination of the
responses of all the individual analytes present in the sample, $\mathbf{N}_i$, plus error, $\mathbf{E}$:

$$\mathbf{M} = \sum_{i=1}^{q} c_i\,\mathbf{N}_i + \mathbf{E} \tag{3}$$

where the matrices $\mathbf{N}_i$ have been scaled such that the coefficients $c_i$ are the corresponding
concentrations. If the matrices $\mathbf{N}_i$ and $\mathbf{M}$ have rank approximately equal to 1 and $q$,
respectively (bilinear data), and the data matrix $\mathbf{N}_k$ for a particular analyte is known, it can be
shown that $c_k$ in equation 3 can be estimated for an unknown $\mathbf{M}$ by using7,8

$$\frac{1}{\hat{c}_k} = \sum_{i=1}^{I}\sum_{j=1}^{J} N_{ij}\,(\mathbf{M}^{+})_{ji} \tag{4}$$

where $\mathbf{M}$ is an $I \times J$ matrix, $(\mathbf{M}^{+})_{ji}$ is the element in the $j$th row and the $i$th column of a
pseudoinverse of $\mathbf{M}$, and the subscript $k$ has been dropped from $N_{ij}$ for simplicity; equivalently,
$1/\hat{c}_k$ is the trace of $\mathbf{N}_k\mathbf{M}^{+}$. Equation 4 implies that a single calibration sample with the
analyte of interest is sufficient for estimating the concentration of the analyte in an unknown sample.9
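A numerical check of equation 4 on noise-free, synthetic bilinear data can be sketched as follows (all profiles and concentrations are hypothetical). The calibration matrix contains only the analyte at unit concentration; the unknown also contains an interferent that overlaps in both orders and never appears in the calibration, and the trace form of equation 4 still recovers the analyte concentration. The cutoff `rcond=1e-10` passed to `numpy.linalg.pinv` is an assumption chosen so that the numerically zero singular values of the rank-deficient M are discarded; with real, noisy data the rank would have to be chosen explicitly.

```python
import numpy as np

def single_sample_estimate(N_k, M, rcond=1e-10):
    """Equation 4: 1/c_k = sum_ij N_ij (M^+)_ji = trace(N_k M^+)."""
    return 1.0 / np.trace(N_k @ np.linalg.pinv(M, rcond=rcond))

# Hypothetical bilinear profiles: 6 elution times x 4 wavelengths.
x_k = np.array([0.0, 0.2, 0.9, 1.0, 0.3, 0.0])     # analyte elution profile
y_k = np.array([0.2, 1.0, 0.6, 0.1])               # analyte spectrum
x_int = np.array([0.0, 0.0, 0.4, 1.0, 0.8, 0.2])   # interferent, overlapping in time
y_int = np.array([0.9, 0.4, 0.2, 0.8])             # interferent, overlapping in wavelength

N_k = np.outer(x_k, y_k)                           # calibration: analyte at unit concentration
M = 2.5 * N_k + 1.7 * np.outer(x_int, y_int)       # unknown: interferent absent from calibration

c_hat = single_sample_estimate(N_k, M)             # 2.5, up to rounding error
```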

If a multicomponent calibration sample, $\mathbf{N}$, is used, with some analytes at known
concentrations, it is possible to determine the concentrations of all those analytes
in an unknown sample, together with their individual spectra. Fig. 3 illustrates this with
an example: the bilinear LC/UV data from (1) a two-component calibration sample and
(2) a test (“unknown”) sample with three constituents are collected. Applying second order
bilinear calibration methods,10 (i) the UV spectra, (ii) the chromatographic concentration
profiles, and (iii) the ratios of concentrations (calibration/unknown) are obtained. This is
possible using the non-symmetric eigenvalue-eigenvector equation10

$$(\mathbf{N}\,\mathbf{M}^{+})\,\mathbf{X} = \mathbf{X}\,\boldsymbol{\Lambda} \tag{5}$$

where $\mathbf{M}^{+}$ is the pseudoinverse of the unknown sample data matrix, the columns of $\mathbf{X}$ are the
eigenvectors (pure spectra in one of the orders), and $\boldsymbol{\Lambda}$ is a diagonal matrix of eigenvalues,
which are the ratios of concentrations, between the calibration sample and the unknown sample, of the
analytes in common. More complete details are given elsewhere.9,10,11
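Equation 5 can likewise be sketched with NumPy's general eigensolver on synthetic data: a two-component calibration sample and a three-component unknown, mirroring the structure of the Fig. 3 example but with invented profiles and concentrations. The function name `gram_ratios`, the rule of keeping the eigenvalues of largest magnitude, and the `rcond` cutoff are illustrative assumptions, not the authors' implementation; with noisy data the rank and the pseudoinverse would need more careful treatment.

```python
import numpy as np

def gram_ratios(N, M, n_common, rcond=1e-10):
    """Solve (N M^+) X = X Lambda (equation 5) and keep the n_common eigenpairs with the
    largest |eigenvalue|: eigenvalues = concentration ratios (calibration/unknown),
    eigenvectors = pure profiles in the row order of the matrices."""
    A = N @ np.linalg.pinv(M, rcond=rcond)
    evals, evecs = np.linalg.eig(A)
    keep = np.argsort(-np.abs(evals))[:n_common]
    return evals[keep].real, evecs[:, keep].real

# Hypothetical elution profiles (8 times) and spectra (5 wavelengths) for three constituents.
x1 = np.array([0.0, 0.3, 1.0, 0.6, 0.2, 0.0, 0.0, 0.0])
x2 = np.array([0.0, 0.0, 0.2, 0.7, 1.0, 0.5, 0.1, 0.0])
x3 = np.array([0.0, 0.0, 0.0, 0.1, 0.4, 1.0, 0.6, 0.2])
y1 = np.array([1.0, 0.6, 0.2, 0.1, 0.0])
y2 = np.array([0.2, 0.8, 1.0, 0.4, 0.1])
y3 = np.array([0.0, 0.1, 0.4, 0.9, 1.0])

# Two-component calibration sample and three-component "unknown".
N = 1.0 * np.outer(x1, y1) + 2.0 * np.outer(x2, y2)
M = 0.5 * np.outer(x1, y1) + 4.0 * np.outer(x2, y2) + 1.5 * np.outer(x3, y3)

ratios, profiles = gram_ratios(N, M, n_common=2)
# ratios: approximately 2.0 (= 1.0/0.5, constituent 1) and 0.5 (= 2.0/4.0, constituent 2)
```

Here the eigenvectors are the elution profiles, because the rows of N and M are elution times; calling `gram_ratios(N.T, M.T, 2)` would return the profiles in the other order, i.e., the spectra. Constituent 3, absent from the calibration sample, contributes only a zero eigenvalue.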
It is also possible, and in many cases desirable, to use several calibration samples instead
of one, e.g., to cover a wider dynamic range, to reduce the effect of collinearities, or to
increase the precision of the predicted concentrations.8


Conclusion. 

Many aspects of calibration have been omitted from this discussion to focus on the order
of the instrument. Among them are non-linear responses, outlier detection, precision and
accuracy, sampling, sample selection, variable selection, experimental design, and time
dependence of the responses. These factors are well studied only for the case of zero order
calibration. The multivariate regression literature provides a good starting point for first
order calibration, but few papers have been published that address these issues for
calibration.12 For second and higher order calibration, almost no work has been done, with very
few exceptions.13 These issues represent an important challenge for the chemometrician,
and we expect important developments in these areas in the near future.